[refact](udf) remove the udf cache expiration_time property#63897
Conversation
Issue Number: close #xxx <!--Describe your changes.-->
## Proposed changes Issue Number: close #xxx <!--Describe your changes.-->
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
2250be3 to
a1f5113
Compare
There was a problem hiding this comment.
Summary: I found two correctness issues in the new static UDF classloader cache behavior. The main blocker is a first-use race where one executor can close another executor's live URLClassLoader, preserving the NoClassDefFoundError class of failure this PR is trying to eliminate. There is also a regression for static-load UDFs loaded through the system classloader, where a null classLoader is valid but now treated as a cache miss.
Critical checkpoints:
- Goal/test: The PR aims to stop time-based UDF classloader eviction from breaking static-load Java UDFs. The goal is only partially met; concurrent first use can still close a live loader, and system-classloader static UDFs are not cached effectively. I did not find tests covering these concurrency/system-loader paths.
- Scope: The change is focused, but the cache lifecycle semantics changed from synchronized ExpiringMap operations to ConcurrentHashMap replacement without atomic construction.
- Concurrency: The modified static cache is shared by concurrent Java UDF executors and BE clean-cache tasks. The cache miss/build/put path is not atomic, which creates the live-loader close race noted inline.
- Lifecycle: UdfClassCache.classLoader may intentionally be null for system-classloader UDFs; the new validity check does not preserve that lifecycle invariant.
- Configuration/compatibility: expiration_time remains accepted and serialized but is now ignored; this is a user-visible semantic change and should be documented or removed in a coordinated way.
- Parallel paths: DROP FUNCTION cleanup still exists through FE clean-cache tasks and BE JNI cleanup; static-load lookup is the affected path.
- Testing: No new tests were included for concurrent static-load first use, DROP/reload lifecycle, or empty jarPath/system-classloader static UDFs.
- Observability/performance: No additional observability is required for the core issue, but repeated rebuilding for system-classloader static UDFs is avoidable overhead.
- Data/transaction/persistence: Not applicable to data visibility or transaction persistence.
User focus: No additional user-provided review focus was specified.
|
run buildall |
TPC-H: Total hot run time: 31603 ms |
TPC-DS: Total hot run time: 172206 ms |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
Summary: I found one remaining correctness issue in the static UDF cache lifecycle. The latest changes fix the previously raised null-classloader reuse and concurrent first-use replacement issues, but DROP cleanup can still race with a first-time load and leave a stale cache entry published after the cleanup has already run.
Critical checkpoints:
- Goal/test: The PR aims to stop time-based static UDF classloader eviction from breaking long-running Java UDF queries. The main eviction issue is addressed, but the new drop-only lifecycle is incomplete because cleanup is not coordinated with concurrent first-time cache publication. I did not find a test covering DROP concurrent with first static-load initialization followed by recreate/reuse.
- Scope: The changed surface is focused on UDF classloader cache lifecycle.
- Concurrency: The modified cache is shared by executor initialization threads and BE clean-cache tasks. The current
removeand laterputIfAbsentrace can resurrect a dropped function cache. - Lifecycle: Removing time-based eviction makes explicit DROP cleanup the only lifecycle boundary, so a missed cleanup now leaves stale class metadata and a
URLClassLoaderindefinitely. - Configuration/compatibility:
expiration_timeremains accepted and serialized but is intentionally ignored; this is user-visible and should be covered by the linked documentation. - Parallel paths: The BE clean-cache task still drops downloaded library files by function id, but Java class metadata cleanup is keyed only by signature, which is the affected path.
- Testing: No new regression/unit test is included for the new DROP-only lifecycle or the cleanup-vs-load race.
- Observability/performance: No additional observability issue found beyond the stale cache behavior.
- Data/transaction/persistence: Not applicable to table data visibility or transaction persistence.
- User focus: No additional user-provided review focus was specified.
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
### What problem does this PR solve? Problem Summary: doc apache/doris-website#3845 ``` CREATE FUNCTION print_12() RETURNS int PROPERTIES ( "file" = "file:///path/to/java-udf-demo-jar-with-dependencies.jar", "symbol" = "org.apache.doris.udf.Print", "always_nullable"="true", "type" = "JAVA_UDF", "static_load" = "true", // default value is false "expiration_time" = "60" // default value is 360 minutes ); ``` ``` before in the java-udf could use static_load and expiration_time to control the cache jar times in BE. which use a backgroud thread to scan the jars every ten minutes, check it's init times, and then drop it if time expire. those will cause some long running query failed when the backgroud thread remove it. Now, remove the expiration_time, and the jar will be clean when drop fucntion immediately ```
What problem does this PR solve?
Problem Summary:
doc apache/doris-website#3845
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)